
[LinalgExt] Added toggle for using useExp2 for onlineAttention Decomposition#22778

Closed
keshavvinayak01 wants to merge 72 commits into iree-org:main from keshavvinayak01:users/keshavvinayak01/onlineattention-useexp2-toggle

Conversation

@keshavvinayak01
Contributor

@keshavvinayak01 keshavvinayak01 commented Nov 27, 2025

Following the discussion from #22441

Depending on the backend, certain computations may benefit from directly using exp instead of exp2, since FP reassociation can introduce accuracy losses. It's helpful to add a flag in case the user traces losses to this particular computation and prefers to use exp directly.

The use_exp2 flag is mostly unused in dialect conversions and passes; I presume it's used as a KernelOption. The changes here do not modify the default behavior.
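For context, the exp2 rewrite relies on the identity exp(x) = 2^(x · log2 e); the extra multiply by log2 e is where rounding (and any FP reassociation) can perturb results. A minimal Python sketch of the rewrite (illustrative only, not IREE code):

```python
import math

LOG2E = math.log2(math.e)  # ~1.4426950408889634

def exp_via_exp2(x: float) -> float:
    # exp(x) rewritten as 2**(x * log2(e)); the extra multiply is where
    # floating-point rounding can make the result differ from exp(x).
    return 2.0 ** (x * LOG2E)

x = 3.7
direct = math.exp(x)
rewritten = exp_via_exp2(x)
# The two agree to near machine precision, but not necessarily bit-exactly.
assert abs(direct - rewritten) / direct < 1e-12
```

Backends often prefer exp2 because hardware exposes a fast base-2 exponential, which is why the rewrite is the default.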

…ionOp -> LinalgExt::AttentionOp

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
@keshavvinayak01 keshavvinayak01 changed the title [LinalgExt] Added toggle for using useexp2 for onlineAttention Decomposition [LinalgExt] Added toggle for using useExp2 for onlineAttention Decomposition Nov 27, 2025
@keshavvinayak01 keshavvinayak01 marked this pull request as ready for review November 27, 2025 06:24
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Collaborator

@MaheshRavishankar MaheshRavishankar left a comment


I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

@keshavvinayak01
Contributor Author

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

@MaheshRavishankar
Collaborator

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

I don't know the history of that, but we probably need to drop the old usage and just add an optional attribute here. @Groverkss, comments?

@Groverkss
Contributor

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

I don't know the history of that, but we probably need to drop the old usage and just add an optional attribute here. @Groverkss, comments?

I think it's okay to have a decomposition config dictionary and add these attributes to it. The attention op needs multiple configuration options, so it's useful to have a dictionary.

@Groverkss
Contributor

@MaheshRavishankar Can you have a look at this again?

Contributor Author

@keshavvinayak01 keshavvinayak01 left a comment


Let's re-run CI on this and get it merged? @Groverkss

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
kuhar and others added 14 commits January 20, 2026 05:47
…pass SwizzleHintOps (#23084)

This is the second of a series of PRs that together implement support in
IREE for XOR swizzling through the SwizzleHintOp.

There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten
swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into
empty ops through swizzle hint ops.
3) Add swizzle hint attribute to be set in `lowering_config` and
consumed in `GPUPromoteMatmulOperandsPass`.
4) Update `LLVMGPUSelectLoweringStrategy` Pass to set xor swizzles for
MXFP4 GEMMs.

This is PR 2, which does two things:
- Duplicates folding patterns for the tensor.empty op from upstream llvm-project in IREE, but with support for swizzle hint ops.
- Adds these patterns to the `GPUApplyTilingPass`.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This is the first of a series of PRs that together implement support in
IREE for XOR swizzling through the SwizzleHintOp.

There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten
swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into
empty ops through swizzle hint ops.
3) Add swizzle hint attribute to be set in `lowering_config` and
consumed in `GPUPromoteMatmulOperandsPass`.
4) Update `LLVMGPUSelectLoweringStrategy` Pass to set xor swizzles for
MXFP4 GEMMs.

This is PR 1, which does three things:
- Loosens the restriction on SwizzleHintOp inputs needing to be a Shaped
type of rank 1. We do this because things are a lot simpler during
tiling when you can fold arbitrary shapes into the swizzle hint op and
then flatten later.
- Introduces a pass to flatten allocs associated to `SwizzleHintOps`.
- Moves the verification of flatness of swizzle hint ops to the
`ResolveSwizzleHintOps` pass, prior to removal.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
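As a rough illustration of what XOR swizzling does (this is a generic sketch, not IREE's actual swizzle pattern), XORing the low bits of the row into the column index permutes columns per row, which is the standard trick for spreading accesses across shared-memory banks:

```python
def xor_swizzle(row: int, col: int, bits: int = 3) -> int:
    # Hypothetical tile swizzle: XOR the low bits of the row into the
    # column index. Each row gets a distinct permutation of its columns.
    mask = (1 << bits) - 1
    return col ^ (row & mask)

# Within an 8x8 tile, every row maps columns 0..7 to a permutation of 0..7,
# so no data is lost, but a fixed column no longer hits a fixed bank.
for row in range(8):
    assert sorted(xor_swizzle(row, col) for col in range(8)) == list(range(8))
```

Because the swizzle is its own inverse per row, writes and reads that apply the same function stay consistent; the SwizzleHintOp described above carries this kind of mapping through the compiler.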
You can enable it with `-DIREE_REVERSE_ITERATION=On`.

I found 4 failing tests but there might be more non-determinism.
```
iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting.mlir
iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting_scf.mlir
iree/compiler/Dialect/Util/Transforms/test/hoist_into_globals.mlir
iree/compiler/GlobalOptimization/test/hoist_into_globals.mlir
```

Once fixed, I plan to enable this in CI.
Pass booleans instead of `nullptr`; the latter confuses some compilers because both `bool` and `Value` are constructible from `nullptr`.

Also clean up comments and needlessly complicated code just above.

Fixes: #23164
… modified (#23168)

* Updates the torch_ops configuration file to skip running some tests (new tests added without a golden_value, and a new failing test that was not skipped).
* Adds a new rule to configure_ci.py to run torch tests whenever configuration files are modified; otherwise one needs to remember to add ci-extra to run the relevant tests. (onnx and sharktank are not included here since they are always run on pre-submit.)
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Adds a pass to remove iree_codegen.index_hint operations. The pass unconditionally drops all index_hint ops; it should run once the compiler is done using them for optimizations, since the ops can get in the way of later transformations.

The pass is not added to any pipelines, because we are not generating
index_hint ops anywhere yet, but this pass will be added later once
index_hints start to be used.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Enable tests that were previously excluded but now pass:

ROCM/HIP (tests/e2e/linalg):
- conv2d, narrow_n_matmuls, subbyte_to_fp, fp_to_subbyte,
fp4_f32_conversion, index

VMVX (tests/e2e/linalg):
- argmax, index

VMVX (tests/e2e/linalg_ext_ops):
- attention

Vulkan (tests/e2e/linalg):
- argmax, index

Vulkan (tests/e2e/linalg_ext_ops):
- map_gather, map_scatter, top-k

Vulkan (tests/e2e/stablehlo_ops):
- reverse

Below is the additional testing time on my machine (using gfx1100):

```
● Test execution times for newly enabled tests:
  ┌──────────┬───────┬────────────┐
  │ Backend  │ Tests │ Total Time │
  ├──────────┼───────┼────────────┤
  │ ROCM/HIP │ 6     │ 3.06 sec   │
  ├──────────┼───────┼────────────┤
  │ VMVX     │ 3     │ 0.28 sec   │
  ├──────────┼───────┼────────────┤
  │ Vulkan   │ 6     │ 0.58 sec   │
  ├──────────┼───────┼────────────┤
  │ Total    │ 15    │ ~3.9 sec   │
  └──────────┴───────┴────────────┘
  Individual test breakdown:

  ROCM/HIP:
  - conv2d: 0.28s
  - fp4_f32_conversion: 0.39s
  - fp_to_subbyte: 0.43s
  - index: 0.27s
  - narrow_n_matmuls: 0.97s
  - subbyte_to_fp: 0.72s

  VMVX:
  - argmax: 0.04s
  - index: 0.04s
  - attention: 0.20s

  Vulkan:
  - argmax: 0.05s
  - index: 0.05s
  - map_gather: 0.13s
  - map_scatter: 0.12s
  - top-k: 0.19s
  - reverse: 0.05s

  All tests are fast (under 1 second each). The slowest is narrow_n_matmuls on ROCM at ~1 second.
```

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Injects iree_codegen.index_hint ops on offsets in the
populateOperandOffsetsSizesStrides functions for MMAAttrs. We inject the
hints here, because the semantic information about the offsets is
readily available, and can easily carry down to the later optimization
pass that converts loads into transpose loads using these hints. These
hints are intended for load to transpose load optimizations, but they
are set unconditionally regardless of transpositions for simplicity. The
later optimization pass is responsible for determining when the loads
are transposed, since it is more explicit at that point.

The hint ops will be dropped right after LLVMGPULowerExecutableTarget,
since at that point the index_hint ops should already have been used.
Currently, the pass that consumes these hint ops is not enabled, so the
hint ops will be doing nothing until the pass is added.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
I don't want to add too many CI workflows, so this is added together with ubsan.
This is hard to test for because only the (dynamic) host feature list is
unordered, unlike features for a specific target, and we can't assume a
specific host in tests.
* Use `llvm::IsaPred<T>` instead of lambdas where possible
* `!any_of` --> `none_of`
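The `!any_of` → `none_of` rewrite is a pure readability change; the two forms are logically equivalent. A quick sketch of the identity in Python (`llvm::none_of` is the actual C++ utility; this only demonstrates the logic):

```python
def none_of(xs, pred):
    # Logically identical to `not any(pred(x) for x in xs)`, but states
    # the intent positively, mirroring the llvm::none_of cleanup.
    return all(not pred(x) for x in xs)

evens = [2, 4, 6]
is_odd = lambda x: x % 2 == 1
assert none_of(evens, is_odd)
assert none_of(evens, is_odd) == (not any(is_odd(x) for x in evens))
assert not none_of([1, 2], is_odd)
```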
@keshavvinayak01 keshavvinayak01 force-pushed the users/keshavvinayak01/onlineattention-useexp2-toggle branch from 8b4a278 to b47101f Compare January 20, 2026 05:53
@keshavvinayak01 keshavvinayak01 deleted the users/keshavvinayak01/onlineattention-useexp2-toggle branch January 20, 2026 06:11
@MaheshRavishankar
Collaborator

What happened here? Why did you close this?

@keshavvinayak01
Contributor Author

I was trying to rebase and push to trigger CI, but the git history got messed up. So I re-opened it as #23211.

@hanhanW
Contributor

hanhanW commented Jan 20, 2026

Next time, you can check out the #23211 branch and run `git checkout -B this-branch`. It will overwrite the current branch's history, and you can re-open this PR.

Merging a PR like that introduces overhead for future code tracking, IMO. It forces people to click through many links to find old review comments and the reasons behind the changes.
